Abstract

Helicobacter pylori has probably infected the human stomach since our origins and subsequently diversified in parallel with
their human hosts. The genetic population history of H. pylori can therefore be used as a marker for human migration. We
analysed seven housekeeping gene sequences of H. pylori strains isolated from 78 Senegalese and 24 Malagasy patients and
compared them with the sequences of strains from other geographical locations. H. pylori from Senegal and Madagascar
can be placed in the previously described hpAfrica1 genetic population, in the subpopulations hspWAfrica and hspSAfrica,
respectively. These two subpopulations correspond to the distribution of Niger-Congo speakers in West Africa and in most of
subequatorial Africa (owing to the Bantu migrations), respectively. H. pylori appears as a single population in Senegal, indicating a
long common history between ethnicities as well as frequent local admixture. The lack of differentiation among the Senegalese
isolates, together with increasing genetic differentiation with geographical distance between sampling locations in Africa,
provides evidence for genetic isolation by distance. The Austronesian expansion that started from Taiwan 5000 years ago dispersed
one of the 10 subgroups of the Austronesian language family via insular Southeast Asia into the Pacific and Madagascar,
and hspMaori is a marker for the entire Austronesian expansion. Strain competition and replacement of hspMaori by
hpAfrica1 strains from Bantu migrants are the probable reasons for the presence of hspSAfrica strains in Malagasy of
Southeast Asian descent. hpAfrica1 strains appear to be generalist strains that have the necessary genetic diversity to
efficiently colonise a wide host spectrum.
Introduction

The association between Helicobacter pylori and man is very old: 
humans have probably been infected with H. pylori since their 
origins [1]. Like human mitochondrial DNA (mtDNA), the global 
phylogeny of H. pylori sequences consists of two super-lineages. 
The first contains mtDNA haplogroup L0, which is found
predominantly in hunter-gatherers in southern Africa, the San,
who are the natural hosts of the H. pylori super-lineage hpAfrica2.
The other super-lineage contains the human mtDNA haplogroups 
L1-L6, which correspond to other H. pylori populations [1]. H. 
pylori accompanied modern humans during their migration out of 
Africa about 60,000 years ago, and mirrors the human pattern of 
decreased genetic diversity with distance from Africa [2]. 
Geographical separation plus founder effects have resulted in 
distinct bacterial populations with specific geographical distributions
[1-9]: hpEurope (present in Europe, the Middle East and
west and South Asia), hpNEAfrica (northeast Africa), hpAfrica1
(western, central and southern Africa), hpAfrica2 (southern Africa),
hpAsia2 (northern India, Bangladesh, Thailand and Malaysia), 
hpSahul (in Australian Aboriginals and Papua New Guineans) and 
hpEastAsia with the subpopulations hspEAsia (in East Asians), 
hspMaori (in Taiwanese aboriginals, Melanesians and Polynesians) 
and hspAmerind (in Native Americans). 

The specific geographic distribution and ethnic association of 
the H. pylori genetic populations reflect numerous ancient and 
historic human migrations [10] which established H. pylori 
sequences as a useful genetic marker to solve controversial issues 
in human population history. For example, H. pylori sequences 
revealed an ancient migration from India to continental Southeast 
Asia [9] that had not been discerned from human genetic markers. 
Work on H. pylori from Pacific islanders contributed to resolving 
the dispute about the origins and trajectory of Austronesian 
expansion [8]. According to archaeologists, agriculturists spread 
from Taiwan via insular and coastal Melanesia into the Pacific, as 
marked by the Lapita cultural complex, including red-slipped 
pottery, Neolithic tools, chickens, pigs and farming [11]. This
trajectory was supported by linguists, because the topology of the 
language tree of the Austronesian language family is compatible 
with the conclusion that Taiwan is at the origin of the expansion 
[12]. Although some human geneticists favour insular Southeast 
Asia as the source [13], genetic patterns in DNA sequences of the 
H. pylori subpopulation hspMaori provide strong genetic support 
for a Taiwanese origin of Austronesian speakers [8]. H. pylori 
sequences further showed patterns of genetic variability that 
distinguished Buddhists and Muslims, the two major ethnic 
communities in Ladakh in northern India, whereas traditional 
human genetic markers, such as microsatellites and the hyper- 
variable region of mtDNA, were unable to differentiate these 
populations [4]. 

Data on the genetic structure of H. pylori in Africa are scarce 
compared to other world regions [1-3,14]. Yet, this information is 
important for reconstructing human evolutionary history [2,4,8,9] 
and for interpreting geographical differences in the incidence of 
gastric cancer, as genetic background might be a marker for 
virulence factors directly involved in clinical outcome [9]. We 
analysed the genetic diversity of 78 H. pylori strains isolated from 
five ethnic groups in Senegal (West Africa) and 24 strains from 
Madagascar, an island in the western Indian Ocean about 600 km 
off the African coast. We attempted to differentiate the human 
ethnicities on the basis of their H. pylori and to gain additional 
insight into human population history in Africa by studying the 
genetic diversity of H. pylori. 

Materials and Methods 

Strains and Ethics Statement 

Gastroduodenal endoscopy was performed at the gastroenterology
departments of Le Dantec Hospital in Dakar (Senegal) and
of Befelatanana Hospital in Antananarivo (Madagascar) in 2007
and 2008, with the permission of the National Ethics Committee
of Senegal ("Comité National d'Éthique pour la Recherche en
Santé") (ethics certificate 02612/MSPM/DS/DER) and of the
National Ethics Committee of the Ministry of Health and Family
Planning of Madagascar (ethics certificate 006/SANPF/2007).
Informed written consent was obtained from all participants.

The physician prospectively collected demographic data. All the 
patients were of indigenous origin, and none had received proton 
pump inhibitors or antibiotics during the 4 weeks before 
endoscopy. Three biopsy samples were taken from the antrum 
and three from the fundus during upper gastrointestinal tract 
endoscopy. One biopsy from each site was cultured for H. pylori 
isolation, and the others were fixed and processed for histological 
analysis. 

The strains were supplemented by sequences obtained from the
H. pylori multi-locus sequence typing web site (http://pubmlst.org/helicobacter/),
as published by Falush et al. 2003 [3], Wirth et al.
2004 [4], Momynaliev et al. 2005 [15], Linz et al. 2007 [2], Devi
et al. 2007 [6], Tay et al. 2009 [7], Liao et al. 2009 [16], Moodley
et al. 2009 [10], Breurec et al. 2011 [9] and Moodley et al. 2012
[1]. The new sequences obtained in this study were deposited on
the web site under the identification numbers 1552-1656.

Histology 

Gastric biopsy samples were studied after the usual staining, and 
the lesions were classified according to the updated Sydney criteria 
[17] and the Vienna classification for dysplasia [18]. 



H. pylori Isolates and Genomic DNA 

H. pylori was cultured and identified as previously described 
[19]. A single H. pylori colony from the antrum or fundus was 
chosen and subcultured from a primary growth. Genomic DNA 
was extracted with the QIAamp kit (Qiagen, Courtaboeuf,
France). 

Data Analysis 

PCR amplification and sequencing of atpA, efp, mutY, ppa, trpC,
ureI and yphC were performed as previously described [2]. Strain
population assignment was performed as described by Falush et al.
[3] using the "no admixture model" of Structure [20]. The
linkage model in Structure was used to estimate the proportion
of nucleotides derived from each ancestral population, as
described elsewhere [2,3]. A ClonalFrame analysis was
performed for the inference of bacterial microevolution with
100,000 iterations and a burn-in period of 50,000
iterations, using the scaled mutation rate θ set equal to
Watterson's moment estimator [21,22]. The genealogy was
sampled every 100 iterations after the initial burn-in phase. This
analysis was repeated 100 times, and an 80% consensus tree of all
the sampled genealogies was computed.
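The idea behind an 80% consensus tree can be illustrated with a minimal sketch: each sampled genealogy is reduced to its set of clades (the leaf set below each internal node), and only clades present in at least 80% of the samples are kept. This is a generic majority-rule consensus, not ClonalFrame's own implementation; the nested-tuple tree format and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Union

# Hypothetical tree format: a leaf is a strain ID (str); an internal
# node is a tuple of child subtrees, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, tuple]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every internal node to `out`; return this subtree's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves: FrozenSet[str] = frozenset().union(*(collect_clades(c, out) for c in tree))
    out.add(leaves)
    return leaves


def consensus_clades(trees: List[Tree], threshold: float = 0.8) -> Set[FrozenSet[str]]:
    """Keep clades that occur in at least `threshold` of the sampled genealogies."""
    counts: Counter = Counter()
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        collect_clades(tree, found)
        counts.update(found)  # each clade is counted at most once per tree
    n_trees = len(trees)
    return {clade for clade, c in counts.items() if c / n_trees >= threshold}


if __name__ == "__main__":
    # Hypothetical sample: four genealogies group (A,B) and (C,D), one does not.
    samples = [(("A", "B"), ("C", "D"))] * 4 + [(("A", "C"), ("B", "D"))]
    for clade in consensus_clades(samples):
        print(sorted(clade))
```

Because any threshold above 50% only retains clades that are mutually compatible, the surviving clades always nest into a single tree.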

Pairwise FST values and analyses of molecular variance
(AMOVA) were calculated in Arlequin [23] as described
previously [24], using the Kimura 2-parameter model previously
applied to H. pylori sequences [2-4,24]. The significance of the
pairwise FST values was estimated by running 10,000 permutations,
assuming no difference between the populations. Neighbour-joining
trees from the pairwise FST values were generated in MEGA
v4 [25]. Nucleotide diversity (π) was calculated with DnaSP as
previously described [8].
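Arlequin's internals are not reproduced here, but the sketch below illustrates, under stated assumptions, how the three ingredients of this analysis fit together: a Kimura 2-parameter (K2P) distance between aligned sequences, an AMOVA-style ΦST that treats each pairwise distance as a squared distance, and a permutation test that reshuffles population labels. The function names are hypothetical, sites with non-ACGT characters are simply skipped, and the p-value uses the common (hits + 1)/(permutations + 1) convention, which may differ from Arlequin's reporting.

```python
import math
import random
from typing import List, Sequence

TRANSITION_PAIRS = ({"A", "G"}, {"C", "T"})  # purine<->purine, pyrimidine<->pyrimidine


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura 2-parameter distance; sites with a non-ACGT character are skipped."""
    transitions = transversions = valid = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        valid += 1
        if a != b:
            if {a, b} in TRANSITION_PAIRS:
                transitions += 1
            else:
                transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / valid, transversions / valid
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def phi_st(dist: List[List[float]], groups: Sequence[int]) -> float:
    """AMOVA-style Phi_ST, treating each pairwise distance as a squared distance."""
    n = len(groups)
    labels = sorted(set(groups))
    n_pops = len(labels)
    ssd_total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / n
    ssd_within, sizes = 0.0, []
    for lab in labels:
        idx = [i for i, g in enumerate(groups) if g == lab]
        sizes.append(len(idx))
        pair_sum = sum(dist[i][j] for k, i in enumerate(idx) for j in idx[k + 1:])
        ssd_within += pair_sum / len(idx)
    ms_among = (ssd_total - ssd_within) / (n_pops - 1)
    ms_within = ssd_within / (n - n_pops)
    n0 = (n - sum(s * s for s in sizes) / n) / (n_pops - 1)
    sigma2_among = (ms_among - ms_within) / n0
    return sigma2_among / (sigma2_among + ms_within)


def permutation_p_value(dist: List[List[float]], groups: Sequence[int],
                        n_perm: int = 10_000, seed: int = 1) -> float:
    """Share of random relabelings with Phi_ST >= the observed value (+1 correction)."""
    rng = random.Random(seed)
    observed = phi_st(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null hypothesis: labels are exchangeable
        if phi_st(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In practice, the distance matrix would be filled with `k2p_distance` for every pair of concatenated haplotypes, and `groups` would hold one population label per haplotype.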

In univariate analysis, the chi-square test was used to compare 
categorical variables. P-values <0.05 were considered to denote 
significant associations. 
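For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a Pearson chi-square test of independence for an r x c table (for example, ethnic group by infection status). The function names and counts are hypothetical. To stay dependency-free, the p-value uses the closed-form chi-square upper tail, which is exact only for an even number of degrees of freedom. A 5 x 2 table has 4 degrees of freedom, so this condition is met.

```python
import math
from typing import List, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form needs even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2.0
    term, series = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        series += term
    return math.exp(-half) * series


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int, float]:
    """Return (Pearson statistic, degrees of freedom, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    # Hypothetical counts per group: [infected, uninfected].
    table = [[40, 12], [18, 6], [16, 4], [15, 1], [6, 2]]
    stat, df, p = chi_square_independence(table)
    print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

With small groups, some expected counts fall below 5, and an exact test is often preferred over the chi-square approximation.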

Results 

H. pylori Strains from Senegal and Madagascar 

The 158 patients in Senegal included in this study were from the 
Wolof (32.9%), Fulani (15.2%), Mande (12.7%), Serer (10.1%) and 
Tuculor (5.1%) ethnic groups, reflecting the ethnic diversity in 
Senegal. The ethnic origin was based on self-proclaimed 
membership of the ethnic group for at least two generations. 
Thirty-seven (24.1%) biopsies were obtained from patients of 
uncertain ethnicity or who identified themselves as offspring of 
mixed marriages. The infection rate with H. pylori (n=126, 
79.7%), based on histological analysis and bacterial culture, was 
comparable in all ethnicities, ranging from 75.0% in Fulani and 
Tuculor to 93.8% in Serer (difference not significant). The median 
age was 41.7 years (mean, 45.0 years; range, 18-93 years), and 
55% of the patients (n = 87) were male. On the basis of the 
endoscopic findings, 23 patients had gastritis only, 48 had 
ulcerated lesions, and 11 had suspicion of neoplasia. All cases of
suspected neoplasia were histologically confirmed as gastric 
cancer. 

The infection rate among patients from Madagascar was 74.0%
(54 out of 73), but only 28 biopsies were culture-positive for H. pylori
because of inadequate laboratory facilities. Patients were classified
ethnically from their appearance: 60 (82.2%) were recognizable as 
tropical Southeast Asians and 13 (17.8%) as Africans. There was 
no significant difference in infection rate between ethnic groups. 
The patients' median age was 38 years (range, 3-87 years; mean, 
42.0 years), and 59% (n = 43) were male. Endoscopic examination 
revealed ulcerated lesions in 10 patients (13.7%), while the other
63 (86.3%) participants had gastritis only. 

Sequencing of the seven housekeeping gene fragments that were
used in previous analyses [2,8,9] produced unambiguous sequences
for 78 strains from Senegal and 24 from Madagascar. The concatenated
sequences (3406 base pairs) yielded 102 unique haplotypes that
contained 618 polymorphic sites. These haplotypes were compared
with 160 haplotypes from other African countries and about
1000 haplotypes from other continents. Bayesian clustering
algorithms implemented in the "no-admixture model" of Structure
[3] assigned all 102 strains to the population hpAfrica1, all
78 strains from Senegal to the subpopulation hspWAfrica and the
24 strains from Madagascar to hspSAfrica. hpAfrica1 strains have
previously been shown to be composed of 80-90% of the ancestral
Africa1 population and 10-20% of the ancestral Europe2
population [2]. We used the linkage model of Structure to
estimate the proportion of nucleotides derived from each of the
previously identified ancestral populations [2,3,8] and found that
the strains from Senegal and Madagascar were almost pure
descendants of ancestral Africa1.
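The haplotype and polymorphic-site counts reported above come from concatenating the seven locus sequences of each strain. A minimal sketch of that bookkeeping is shown below, assuming the locus sequences are already aligned to a common length per locus. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def concatenate(alleles: Dict[str, Dict[str, str]], loci: List[str]) -> Dict[str, str]:
    """Join each strain's aligned locus sequences in a fixed locus order."""
    return {strain: "".join(seqs[locus] for locus in loci) for strain, seqs in alleles.items()}


def haplotype_summary(concat: Dict[str, str]) -> Tuple[int, int]:
    """Return (unique haplotypes, polymorphic sites) for equal-length sequences."""
    seqs = list(concat.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    n_unique = len(set(seqs))
    # A site is polymorphic if at least two different nucleotides occur there.
    n_poly = sum(1 for column in zip(*seqs) if len(set(column) & set("ACGT")) > 1)
    return n_unique, n_poly


if __name__ == "__main__":
    # Hypothetical two-locus toy data; real input would have seven loci per strain.
    toy = {
        "s1": {"locus1": "ACGT", "locus2": "AA"},
        "s2": {"locus1": "ACGT", "locus2": "AG"},
        "s3": {"locus1": "TCGT", "locus2": "AA"},
    }
    print(haplotype_summary(concatenate(toy, ["locus1", "locus2"])))
```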

A phylogenetic analysis with ClonalFrame [22] on the
haplotypes of H. pylori strains from Senegal and Madagascar 
confirmed the clustering of the haplotypes into 2 subpopulations, 
hspWAfrica (Senegal) and hspSAfrica (Madagascar). However, no 
substructure was found in either of the subpopulations that 
correlated with the ethnic origin or the gastroduodenal pathology 
(data not shown). 

Lack of Differentiation between H. pylori Strains from 
Different Ethnic Groups in Senegal 

We attempted to differentiate five ethnic groups in Senegal from
the genetic diversity of their H. pylori strains, as had been done in
previous analyses of the genetic diversity of H. pylori in Ladakh in
northern India [4] and in several ethnicities in Iran [24]. Of the 78
sequenced strains from Senegal, 25 were isolated from Wolof
(32.1%), 15 from Fulani (19.2%), 14 from Mande (17.9%), 10 from
Serer (12.8%) and 4 from Tuculor (5.1%). The remaining 10
strains (12.8%) were obtained from descendants of mixed
marriages or people of unknown ethnic background (Table S1).
We compared these strains to the 24 strains from Madagascar with
East Asian (87.5%) or African (12.5%) ancestry and to other,
previously described strains of the hpAfrica1 population isolated in
Burkina Faso (12 strains), Algeria (3 strains), Morocco (5 strains),
and from the Bantu-speaking Xhosa (17 strains) and
Northern Sotho (23 strains) in South Africa [3,26]. These strains
were supplemented with 25 H. pylori isolates from the San,
hunter-gatherers in southern Africa (Table S1) [1].

The H. pylori genetic diversity (π) within individual ethnic groups
in all locations was similar (Table 1). The exceptions were
hpAfrica1 bacteria from Madagascar, which showed considerably
lower genetic diversity (π = 0.0232 vs 0.0269-0.0309), probably
because of bottlenecks associated with the migration of Bantu
from the African east coast carrying H. pylori across the
Mozambique Channel to Madagascar. In contrast, the genetic
diversity of the three strains from Algeria was higher (π = 0.0405),
possibly because of the small number of isolates.
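Nucleotide diversity π is the average proportion of differing sites over all pairs of sequences in a sample. A minimal sketch of that definition is given below, assuming a gap-free alignment. DnaSP's own handling of gaps and missing data is not reproduced here, and the function name is hypothetical.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise proportion of differing sites (Nei's pi), gap-free alignment assumed."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)
```

With only three sequences, π averages just three pairwise comparisons, so a single divergent strain can shift it noticeably. This is consistent with the caution above about the Algerian estimate.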

We next analysed pairwise FST, a measure of genetic
differentiation between populations, where an FST of zero
indicates no divergence, implying free recombination between
two populations, and an FST of one indicates complete isolation of
populations. The FST values among the ethnic groups in Senegal
were very low and not significant (Table 1). For
example, the pairwise FST of 0.0121 between H. pylori from Wolof and
Serer means that only 1.21% of the genetic variance is due to
inter-population differentiation, whereas 98.8% of the variance can
be attributed to the intra-population component. Similarly, the net
between-population diversity of H. pylori from Senegal was close to
zero (Table S2). However, the Senegalese strains differed significantly
from the Burkina Faso isolates (FST = 0.05), despite the
relative geographical proximity of the sampling locations,
their close relation in a neighbour-joining tree (Figure 1) and their
low net between-population diversity. The isolates from Senegal
also differed significantly from all other analysed hpAfrica1 H.
pylori populations, including those from Morocco and Algeria. In
general, H. pylori from different geographic origins were differentiated
from each other, which was also reflected in the moderate net
between-population diversity between most populations. In contrast, H.
pylori from the same location, such as the isolates from
Malagasy of Southeast Asian and of African descent, were not
differentiated (not shown).
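The variance reading of FST and the net between-population diversity can be made concrete with a small sketch. Net diversity is computed here as Nei's d_XY - (d_X + d_Y)/2 on uncorrected p-distances, and FST with a Hudson-style ratio. Both are simplifications chosen for illustration and will not reproduce the Arlequin values reported in this study, which are based on AMOVA with Kimura 2-parameter distances. All function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(pop: List[str]) -> float:
    """Mean pairwise p-distance inside one population."""
    pairs = list(combinations(pop, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x: List[str], pop_y: List[str]) -> float:
    """Mean p-distance over all cross-population pairs."""
    return sum(p_distance(a, b) for a, b in product(pop_x, pop_y)) / (len(pop_x) * len(pop_y))


def net_divergence(pop_x: List[str], pop_y: List[str]) -> float:
    """Nei's net divergence: d_XY - (d_X + d_Y) / 2."""
    return mean_between(pop_x, pop_y) - (mean_within(pop_x) + mean_within(pop_y)) / 2


def hudson_fst(pop_x: List[str], pop_y: List[str]) -> float:
    """Hudson-style FST: 1 - (mean within-population diversity / between-population diversity)."""
    within = (mean_within(pop_x) + mean_within(pop_y)) / 2
    return 1 - within / mean_between(pop_x, pop_y)


def variance_shares(fst: float) -> Tuple[float, float]:
    """Read FST as (between-population, within-population) shares of genetic variance."""
    return fst, 1 - fst
```

For the Wolof/Serer value, `variance_shares(0.0121)` returns shares of 1.21% between populations and 98.79% (about 98.8%) within populations.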

Structure runs of the Senegalese isolates consistently revealed
the highest probability for only one population. In runs assuming
K > 1, all isolates were assigned to a single population, and none
were assigned to other populations in any of the runs for 2 ≤ K ≤ 5,
confirming a lack of difference between H. pylori isolates from
ethnic groups in Senegal and indicating admixture between the
bacteria. In contrast, Structure runs on isolates from Senegal
and from Burkina Faso resulted in two populations, one for each
country, except for three isolates from each geographical location
that were assigned to the other.
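One common way to compare runs with different numbers of populations K is to convert Structure's estimated log probability of the data, ln Pr(X|K), into posterior probabilities under a uniform prior on K. The sketch below assumes that approach. The input values are hypothetical, and this conversion is usually regarded as an approximate, ad hoc guide rather than a formal test.

```python
import math
from typing import Dict


def posterior_over_k(ln_prob_data: Dict[int, float]) -> Dict[int, float]:
    """Turn ln Pr(X|K) estimates into Pr(K|X), assuming a uniform prior over K."""
    max_ln = max(ln_prob_data.values())
    # Subtract the maximum before exponentiating (log-sum-exp) to avoid underflow.
    weights = {k: math.exp(v - max_ln) for k, v in ln_prob_data.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical ln Pr(X|K) values averaged over replicate runs.
    print(posterior_over_k({1: -9210.4, 2: -9262.8, 3: -9301.5, 4: -9350.2, 5: -9399.9}))
```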

Isolation by Distance 

The lack of differentiation between isolates from the same
geographic source and the clear differentiation between isolates
from different geographic locations suggested isolation by distance
among the African isolates, similar to the previously reported
isolation by distance between H. pylori from globally representative
geographic sources [2]. However, the latter study was predominantly
based on non-African H. pylori and included only a few
isolates from sub-Saharan Africa. Therefore, we plotted the
genetic distance (FST) between hpAfrica1 H. pylori from sub-Saharan
Africa against the geographic distance (Fig. 2). The
resulting R² of 0.75 showed that 75% of the variance in FST was
accounted for by geographic distance, which was considerably
more than the previously determined 47% for H. pylori from global
sources [2]. The R² increased to 0.79 when the isolates from the
San were excluded and to 0.88 when isolates from both San and
Madagascar were excluded.
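The isolation-by-distance analysis regresses pairwise FST on geographic distance and reports R². The sketch below assumes great-circle (haversine) distances between sampling locations and ordinary least squares. It also shows how pairs involving particular locations can be dropped before refitting. The input layout and function names are hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float,
                    radius_km: float = 6371.0) -> float:
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def linear_fit_r2(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = a + b*x; return (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


def r2_excluding(pairs: Iterable[Tuple[str, str, float, float]], exclude: Set[str]) -> float:
    """pairs: (location1, location2, km, FST); drop any pair touching an excluded location."""
    kept = [(km, fst) for l1, l2, km, fst in pairs if l1 not in exclude and l2 not in exclude]
    xs, ys = zip(*kept)
    return linear_fit_r2(list(xs), list(ys))[2]
```

Because every location appears in several pairs, the points are not independent. For this reason, the significance of such a relationship is usually assessed with a Mantel-type permutation test rather than a standard regression p-value.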

Discussion 

Isolation by Distance and Genetic Diversity 

The H. pylori from five different ethnicities from Senegal were so 
similar to each other that we could not differentiate the ethnic 
sources based on their H. pylori genetic diversity. Instead, the H. 
pylori appeared as a single population at this geographic origin, 
indicating a long common history as well as frequent local 
admixture. According to archaeologists, Senegambia was popu- 
lated from the north and east in several migration waves, the last 
being that of the Wolof, the Fulani and the Serer during the end of 
the first millennium AD and the early second millennium AD [27]. 

In contrast to the Senegalese strains, H. pylori from several
ethnicities from Iran were sufficiently differentiated that their FST values
revealed substructure. In addition, some Iranian H. pylori were
closely related to those from neighbouring countries such as Israel,
Turkey or Uzbekistan [24], reflecting frequent historical contact.
However, isolates from the countries neighbouring Senegal
(Mauritania to the north, Mali to the east, The Gambia in its
centre, and Guinea and Guinea-Bissau to the south), which could
provide additional insights, are still missing; the only nearby
country represented so far is Burkina Faso.

Figure 1. Neighbour-joining tree based on pairwise FST values of hpAfrica1 haplotypes from sampling locations in Africa.

Gradual genetic introgression of H. pylori from an outside source 
into one of several culturally and/or ethnically separated 
communities was shown to provide means to distinguish between 
otherwise genetically closely related groups, for example Muslims 
and Buddhists in Ladakh in northern India [4]. While traditional 
human genetic markers such as microsatellites or the hyper- 
variable region of the mtDNA were not able to differentiate 
between these communities, H. pylori sequences showed patterns of 
genetic variability that distinguished Buddhists and Muslims [4]. 
All the Senegalese isolates, however, belonged to the H. pylori
population hpAfrica1. Despite the French colonization of Senegal
for about a century (1854-1960), the isolates showed no traces of
introgression from other H. pylori populations, such as hpEurope,
implying only limited, if any, genetic exchange with European
strains. Thus, the lack of a sufficiently different donor population
for the introgression of foreign nucleotides essentially restricted the
Senegalese isolates to the hpAfrica1 population, and frequent local
admixture between H. pylori blurred potential signals that would
have allowed the ethnicities to be distinguished.

Instead, the lack of differentiation between isolates from Senegal
and the increasing genetic differentiation with geographical
distance between sampling locations provide evidence for genetic
isolation by distance. Geographical distance accounted for 75% of the
variance in FST, considerably more than the 47% that
was estimated for H. pylori from global sources [2]. The R²
increased to 0.88 when the isolates from the San and from
Madagascar were excluded, likely due to the elevated FST values in
the pairwise comparisons that involved these sampling locations.
The higher FST values were likely caused by the bottlenecks that were
associated with the colonization of Madagascar from the east
African coast and by the transmission of hpAfrica1 from Bantu to
San [1].

Distribution of hpAfrica1 Strains in Sub-Saharan Africa by
Niger-Congo Speakers

The Niger-Congo language family is distributed all over West 
Africa and most of subequatorial Africa [28,29]. All Niger-Congo 
languages of subequatorial Africa belong to a single, low-order 
subgroup of the Niger-Congo language family, Bantu, which 
contains nearly half of the 1532 Niger-Congo languages, while 
most of the 176 other subfamilies are confined to West Africa.
From their homeland in Nigeria and Cameroon, Bantu societies
expanded east and then south over most of subequatorial Africa,
distributing a variety of agricultural practices, the Bantu branch of
the Niger-Congo language family and H. pylori of the hpAfrica1
population. This expansion, which started as early as 4000 BP,
had reached its southern limit in eastern South Africa by 700 AD
[29,30]. The short time since the beginning of the Bantu
expansion has allowed only two closely related H. pylori
subpopulations to develop: hspSAfrica, which resulted from the Bantu
migrations, and hspWAfrica, which is characteristic of strains from
West Africa, including Senegal [2]. Owing to the slave trade,
hspWAfrica H. pylori can be found in North and South America,
particularly at high frequency among African Americans. The
presence of West African H. pylori in the Maghreb (Morocco, Algeria)
probably reflects contacts between North Africa and the great
sub-Saharan empires such as those of Ghana (700-1100 AD), Mali
(800-1550 AD) and Songhai (1300-1600 AD), including the trade
of gold, salt and slaves [29,31].

Figure 2. Genetic isolation by distance. The genetic distance in H. pylori between pairs of geographical populations (FST) was plotted against the geographical distance between the two populations. 75% of the variance in FST was accounted for by geographical distance.

Strains from Madagascar 

The Austronesian expansion that started from Taiwan 5000 
years ago dispersed one of the 10 subgroups of the Austronesian 
language family along with one of several clades of the H. pylori 
population, hspMaori, via insular Southeast Asia into Melanesia 
and Polynesia, making hspMaori a marker for the entire 
Austronesian expansion [8]. Subsequent migration of a small 
group of Southeast Asian islanders with an effective founding 
population size of 30 women resulted in the settlement of 
Madagascar by Austronesian speakers around 840 AD [32]. They 
were joined around 1000 AD by Bantu migrants crossing the 
Mozambique Channel [33], followed by the arrival of European 
and Chinese settlers and Indian slaves within the past 200 years. A 
complex, largely unknown genetic admixture, involving mainly 
populations of African and Southeast Asian descent, resulted in the 
Malagasy we recognize today, although differences in ethnic 
origins between the highland and coastal regions remain evident 
[34,35]. 

Malaysia and Indonesia, the presumed origins of the first settlers 
of Madagascar, were probably free of H. pylori, as this bacterium 
was introduced into the area only recently by migrants from China 
(hspEAsia) and India (hpAsia2 and hpEurope) [7,9]. Therefore, 
the small number of Austronesian migrants probably arrived in 
Madagascar with few if any H. pylori of the population hspMaori. 
In contrast, the prevalence of H. pylori among Bantu in southern 
Africa is very high, affecting 80-90% of the population [36-38], 
suggesting a strong influx of hpAfrica1 H. pylori from the Bantu
migrants. Subsequent strain competition and replacement of
hspMaori by hpAfrica1 strains are the probable sources of
hspSAfrica strains in Malagasy of Southeast Asian descent. 
Migrations from China, Europe and India within the past 200 
years probably introduced other H. pylori populations into the area; 
however, we did not detect these in our study, probably because of 
the small number of strains. Further investigations are needed to 
confirm our findings based on a larger collection of strains. 

Adaptation of hpAfrica1 Strains to a Wide Range of Human Hosts

Adherence of H. pylori to ABO/Lewis b blood group antigens in
the human stomach is mediated by the blood group antigen-binding
adhesin BabA [39]. Most South American natives carry
the O blood group, and the long co-evolution of their indigenous
H. pylori strains of the hspAmerind population resulted in specialist